- Friday, September 27, 2024
On September 26, 2024, the PostgreSQL Global Development Group announced the release of PostgreSQL 17. The release builds on decades of development, with a focus on performance and scalability.
Performance improves across several areas of database management. The vacuum process has revamped memory management and now uses up to 20 times less memory, which speeds up vacuuming and frees resources for other workloads. Improvements to write-ahead log (WAL) processing allow up to double the write throughput under high concurrency. The new streaming I/O interface accelerates sequential scans and speeds up how quickly ANALYZE can update planner statistics.
Query execution also improves. Queries that use IN clauses with B-tree indexes run faster, and BRIN indexes can now be built in parallel. Other optimizations include better handling of NOT NULL constraints and faster processing of common table expressions. Additional SIMD support, including AVX-512 for the bit_count function, accelerates computation.
For developers, PostgreSQL 17 expands its SQL/JSON support with JSON_TABLE, which converts JSON data into a standard PostgreSQL table. The release also enhances the MERGE command for conditional updates, improves the management of partitioned tables and remote data instances, and speeds up bulk loading and exporting, with up to double the performance when exporting large rows using COPY.
Logical replication, which is central to real-time data streaming, becomes easier to run in high availability setups and across major version upgrades. Logical replication slots no longer need to be dropped before upgrading, so data does not have to be resynchronized afterward. The release also adds failover control for logical replication and the pg_createsubscriber command-line tool.
Security and operational management have improved as well. PostgreSQL 17 introduces a new TLS option for direct handshakes and a predefined role for maintenance operations. The backup utility now supports incremental backups, and pg_dump gains a filtering option for selecting what goes into a dump file. Monitoring and analysis features provide more insight into database performance and session activity.
- Wednesday, April 17, 2024
PostgreSQL's query optimizer has improved massively over the past decade. Using the Join Order Benchmark (JOB), this author shows that tail latency nearly halved between PostgreSQL versions 8 and 16, with each major version improving performance by 15% on average. One of the simplest ways for a team to speed up its database queries is to keep its Postgres instances up to date.
- Monday, July 8, 2024
Pongo is Mongo on Postgres with strong consistency benefits. It treats PostgreSQL as a document database by storing documents as JSONB, which improves performance and storage efficiency. Pongo takes the MongoDB API and translates it to native PostgreSQL queries. Because JSONB stores data in a pre-parsed binary form, read and write operations are faster. JSONB retains the flexibility of storing semi-structured data while allowing users to take advantage of PostgreSQL's robust querying capabilities.
- Monday, June 17, 2024
PostgreSQL with the pgvector extension offers an efficient way to store and query embeddings. It offers simplified querying, data consistency, and better performance compared to using separate databases for relational and vector data.
- Thursday, May 2, 2024
The PostgreSQL extension pgvector now builds indexes over 150x faster than it did a year ago. This is due to binary quantization methods, which reduce index sizes. New indexing methods and CPU-specific SIMD acceleration also helped increase query throughput and reduce latency.
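To make the binary quantization idea concrete, here is a minimal sketch in plain Python. It is not pgvector's implementation: it assumes a sign-based rule (one bit per dimension, set when the value is positive), and the function names and the `shortlist` parameter are hypothetical. It shows why the quantized form is smaller (one bit per dimension instead of a float) and how a cheap Hamming-distance pass can shortlist candidates before an exact re-rank.

```python
def binary_quantize(vector):
    """Pack a float vector into an int, one bit per dimension (1 if value > 0)."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two quantized vectors differ."""
    return bin(a ^ b).count("1")


def l2_squared(a, b):
    """Exact squared Euclidean distance on the original float vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(query, vectors, k=2, shortlist=4):
    """Shortlist by Hamming distance on the bits, then re-rank exactly."""
    query_bits = binary_quantize(query)
    candidates = sorted(
        range(len(vectors)),
        key=lambda i: hamming_distance(query_bits, binary_quantize(vectors[i])),
    )[:shortlist]
    return sorted(candidates, key=lambda i: l2_squared(query, vectors[i]))[:k]


vectors = [
    [1.0, 1.0, -1.0],
    [0.9, 1.1, -0.8],
    [-1.0, -1.0, 1.0],
    [1.0, -1.0, 1.0],
]
print(search([1.0, 1.0, -0.9], vectors, k=2, shortlist=3))  # indexes of the two closest vectors
```

The re-rank step matters because quantization discards magnitude: two vectors with identical bits can still be far apart, so the bit comparison is only used to narrow the candidate set.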
- Thursday, May 16, 2024
This developer discovered a significant performance issue in a database query used for indexing posts in Mattermost. The query was slow because it did too much filtering, and it was sped up by rewriting it with PostgreSQL's row constructor comparisons. To find the fix, the developer used the BUFFERS option of EXPLAIN for detailed insight and favored plans that show Index Cond rather than Filter.
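A row constructor comparison such as `(create_at, id) > (x, y)` compares the columns lexicographically, which is the shape a composite index can serve directly. The sketch below illustrates the equivalence with the more verbose `OR` form. It makes several assumptions: it uses Python's built-in SQLite (which also supports row values) as a stand-in so that it runs without a server, the `posts` table and its columns are hypothetical, and the `OR` form is only a common way of writing the predicate, not necessarily the original Mattermost query. It demonstrates the semantics, not PostgreSQL's planner behavior.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (create_at INTEGER, id TEXT, PRIMARY KEY (create_at, id))"
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [(100, "a"), (100, "b"), (100, "c"), (200, "a"), (200, "b"), (300, "a")],
)


def next_batch_row_comparison(last_create_at, last_id, limit):
    """Fetch the rows after a cursor using a row constructor comparison."""
    return conn.execute(
        "SELECT create_at, id FROM posts "
        "WHERE (create_at, id) > (?, ?) "
        "ORDER BY create_at, id LIMIT ?",
        (last_create_at, last_id, limit),
    ).fetchall()


def next_batch_expanded(last_create_at, last_id, limit):
    """The same predicate written out with OR, for comparison."""
    return conn.execute(
        "SELECT create_at, id FROM posts "
        "WHERE create_at > ? OR (create_at = ? AND id > ?) "
        "ORDER BY create_at, id LIMIT ?",
        (last_create_at, last_create_at, last_id, limit),
    ).fetchall()


print(next_batch_row_comparison(100, "b", 3))
print(next_batch_expanded(100, "b", 3))
```

Both functions return the same rows. On PostgreSQL, running each form under `EXPLAIN (ANALYZE, BUFFERS)` is how one would check whether the predicate ends up in Index Cond or in Filter.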
- Monday, August 26, 2024
PostgreSQL can be used as a search engine. Combining full-text search, semantic search with pgvector, and fuzzy matching with pg_trgm makes PostgreSQL a good-enough search engine for the majority of use cases. This article covers more advanced techniques to personalize search experiences, adjust for document length, debug rankings, and more.
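The fuzzy matching part rests on trigram similarity. The sketch below approximates in plain Python what pg_trgm's `similarity()` computes: each word is lowercased, padded with two leading spaces and one trailing space, split into three-character sequences, and two strings are scored by the share of trigrams they have in common. It is a simplification that assumes ASCII alphanumeric words; the helper names are hypothetical and the real extension handles more cases.

```python
import re


def trigrams(text):
    """Return the set of padded trigrams for every word in the text."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces in front, one behind
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def fuzzy_rank(query, candidates):
    """Order candidate strings from most to least similar to the query."""
    return sorted(candidates, key=lambda c: similarity(query, c), reverse=True)


print(sorted(trigrams("cat")))
print(fuzzy_rank("postgers", ["postgres", "postfix", "progress"]))
```

Because the score depends on overlapping character sequences rather than whole words, a misspelled query still lands close to the intended term.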
- Monday, August 19, 2024
Postgres today is powerful enough to be the default choice for new applications requiring persistent data storage. NoSQL databases like DynamoDB, Cassandra, and MongoDB are not recommended for applications requiring high scalability and specific access patterns because data modeling gets too complex and analytics is tough. This article goes through other alternatives, like Oracle DB and Kafka, to show how Postgres is better.
- Wednesday, April 3, 2024
PostgreSQL would be easier to develop with if it had versioned schema, better online schema migrations, and declarative state-based migrations.
- Friday, September 6, 2024
PostgreSQL users should stop using the "serial" data type and switch to "identity" columns instead. There are several issues with "serial," including its lack of integrity guarantees, awkward ergonomics, and non-compliance with SQL standards. "Identity" columns, on the other hand, offer better safety, easier management, and align with SQL standards.
- Tuesday, July 23, 2024
Running PostgreSQL for others requires additional steps compared to running it for yourself, such as installing extensions, creating server certificates, configuring settings, and creating DNS records. Provisioning can be made faster with optimizations like using a baked OS image, parallelizing steps, and keeping a pool of pre-provisioned databases. High availability involves provisioning primary and standby databases, running regular health checks, and properly fencing the primary in case of failure.
- Monday, July 15, 2024
PLV8 is a trusted JavaScript language extension for PostgreSQL. It can be used for stored procedures, triggers, etc.
- Wednesday, July 10, 2024
The `pgvector-node` library provides Node.js and TypeScript support for integrating vector operations with PostgreSQL across multiple database libraries. It allows the creation of tables with vector fields, insertion of vectors, and retrieval of nearest neighbors using various distance metrics. The library also supports creating approximate indexes for efficient vector similarity searches.
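For readers unfamiliar with the distance metrics involved, the following plain-Python sketch shows what L2 distance, inner product, and cosine distance compute, along with a brute-force nearest-neighbor search. It does not use `pgvector-node` or a database; the function names are hypothetical, and an approximate index would avoid the full scan done here.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product; larger means more similar for normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


def nearest_neighbors(query, items, k, distance=l2_distance):
    """Exact search: score every (id, vector) pair and keep the k closest ids."""
    ranked = sorted(items, key=lambda item: distance(query, item[1]))
    return [item_id for item_id, _ in ranked[:k]]


items = [("a", [1.0, 2.0]), ("b", [5.0, 5.0]), ("c", [1.0, 1.5])]
print(nearest_neighbors([1.0, 1.0], items, k=2))
print(nearest_neighbors([1.0, 1.0], items, k=1, distance=cosine_distance))
```

The choice of metric changes the ranking: with L2 distance the closest item to `[1.0, 1.0]` is `c`, while with cosine distance it is `b`, which points in exactly the same direction.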
- Friday, August 16, 2024
Rails 7.2 has better production defaults, performance boosts with YJIT enabled by default, optimized Puma settings, and easier setup with pre-configured development containers.
- Tuesday, August 13, 2024
Supabase has launched postgres.new, an in-browser Postgres sandbox with AI assistance. This tool utilizes PGlite, a WASM version of Postgres, allowing users to spin up databases directly in their browser. postgres.new also has AI-powered features, such as drag-and-drop CSV import, report generation, charting, and ER diagram creation.
- Thursday, May 30, 2024
An MVP of serverless Postgres using Oriole, Fly Machines, and Tigris for S3 Storage.
- Wednesday, August 7, 2024
This article compares different full-text search (FTS) options for Postgres databases, focusing on Elasticsearch and Postgres's native FTS. While Postgres FTS is simple and real-time, it lacks features and performs poorly on large datasets. Elasticsearch requires ETL pipelines, leading to data freshness issues and operational overhead. The article introduces and compares alternative search engines like Algolia, Meilisearch, ParadeDB, and Typesense.
- Monday, July 15, 2024
As Notion grew exponentially, it had to build a scalable data lake. Its solution involves incrementally ingesting updated data from Postgres to Kafka, then using Hudi to write to S3 for processing. Spark is used for complex tasks like tree traversal and denormalization. This approach has cut costs, improved data freshness, and unlocked new possibilities for AI and search features.
- Friday, May 10, 2024
Postgres Message Queue is a lightweight, open-source message queue.
- Monday, June 3, 2024
This article describes a pattern for geographically distributing PostgreSQL databases for multi-tenant applications using only standard PostgreSQL functionality. The pattern involves separating per-tenant data from control plane data, placing tenant data in the nearest region, creating a global view using Foreign Data Wrappers, and partitioning, while keeping authentication and control plane data centralized. This approach lowers latencies, complies with data residency laws, and allows edge computing while maintaining most PostgreSQL features and ACID guarantees within tenants.
- Friday, September 20, 2024
It's better to use identity columns instead of using the serial data type in Postgres. This is because `serial` has several issues, like permission complexities, a lack of integrity guarantees, and awkward ergonomics. Identity columns provide a better way to manage auto-incrementing primary keys and are also compliant with the SQL standard.
- Tuesday, April 9, 2024
Distributed SQLite databases sacrifice consistency, transactions, and scalability. Traditional databases like PostgreSQL, paired with effective HTTP caching for speed, are better choices than using distributed SQLite. The upside to SQLite databases is that they are really fast, but at some point, the maintenance overhead outweighs the speed benefits.
- Wednesday, September 25, 2024
The Postgres write-ahead log (WAL) is what logical replication is built on. WAL works by storing each state change as a command in an append-only file before the change is actually made to the database, allowing for recovery from the last checkpoint in case of a crash. WAL offers various configurable parameters like `wal_level`, `fsync`, `wal_buffers`, and `checkpoint_flush_after` to optimize performance and control data retention.
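The write-ahead idea can be sketched with a toy key-value store in Python. This is an illustration only and bears no relation to PostgreSQL's actual WAL format: the `TinyStore` class, its JSON record layout, and the file name are all hypothetical, and checkpoints are omitted, so recovery replays the whole log rather than starting from the last checkpoint.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        self._replay()  # rebuild in-memory state from the log on startup

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path) as wal:
            for line in wal:
                self._apply(json.loads(line))

    def _apply(self, record):
        if record["op"] == "set":
            self.data[record["key"]] = record["value"]
        elif record["op"] == "delete":
            self.data.pop(record["key"], None)

    def _log(self, record):
        # Append-only write, forced to disk before the change is applied.
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps(record) + "\n")
            wal.flush()
            os.fsync(wal.fileno())

    def set(self, key, value):
        record = {"op": "set", "key": key, "value": value}
        self._log(record)
        self._apply(record)

    def delete(self, key):
        record = {"op": "delete", "key": key}
        self._log(record)
        self._apply(record)


def crash_and_recover():
    """Write some changes, drop the in-memory state, and rebuild it from the log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        store = TinyStore(path)
        store.set("a", 1)
        store.set("b", 2)
        store.delete("b")
        del store  # simulated crash: only the log file survives
        return TinyStore(path).data


print(crash_and_recover())
```

Since the log is an ordered stream of changes, another process could read the same records and apply them elsewhere, which is the intuition behind building replication on top of it.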
- Monday, August 5, 2024
ClickHouse has acquired PeerDB, a company focused on cost-effective Postgres replication and change data capture. PeerDB offers speed improvements and a number of specialized capabilities that ClickHouse didn't previously offer. Its open source components will remain open source without any change to their licenses, and ClickHouse will also open source the production-grade Helm charts for PeerDB's enterprise offering. Existing commercial customers will be able to use the PeerDB Cloud service until July 24, 2025.
- Monday, March 11, 2024
Databases often focus excessively on benchmark performance, overlooking the fact that a subjectively better user experience is often more important. The rate at which a database improves, ease of use, and how it integrates into existing workflows are all factors that can be more important when choosing a database over just raw performance. Focusing on a streamlined user experience that empowers quick analysis can sometimes offer a better edge than single-metric performance gains.
- Monday, April 15, 2024
Supabase's Index Advisor is a PostgreSQL extension that recommends indexes to improve query performance.
- Monday, April 8, 2024
pgmock is an in-memory PostgreSQL mock server for unit and E2E tests. It requires no external dependencies and runs entirely within WebAssembly on both Node.js and the browser.